Improving Compiler and Run-Time Support for Irregular Reductions
Authors
Abstract
Compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems or by relying on the shared-memory interface supported by software DSMs. Run-time systems gather/scatter nonlocal results (e.g., CHAOS, PILAR), while software DSMs apply local reductions to replicated buffers (e.g., CVM, TreadMarks). We introduce LOCALWRITE, a new technique for parallelizing irregular reductions based on the owner-computes rule. It eliminates the need for buffers or synchronized writes, but may replicate computation. We investigate how connectivity (node/edge ratio), locality (accesses to local data), and adaptivity (edge modifications) affect the relative performance of these approaches. LOCALWRITE improves performance by 50–150% compared to using replicated buffers. Gather/scatter using CHAOS generally provides the best performance, but LOCALWRITE can outperform CHAOS for applications with low locality or high adaptivity. We also find that the flush-update coherence protocol can improve the performance of software DSMs by 15–25% over an invalidate protocol.
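The contrast between replicated buffers and the owner-computes idea behind LOCALWRITE can be illustrated with a small edge-based reduction. The following is a minimal sketch, not the authors' implementation: it simulates the processors sequentially, assumes a block distribution of nodes, and uses a hypothetical pairwise interaction `force`. All function names are illustrative, and neither CHAOS nor any software DSM is modeled.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def force(xi: float, xj: float) -> float:
    # Hypothetical pairwise interaction; any function of the endpoint values works.
    return xj - xi


def sequential(x: List[float], edges: List[Edge]) -> List[float]:
    y = [0.0] * len(x)
    for i, j in edges:
        f = force(x[i], x[j])
        y[i] += f  # irregular write: the index comes from the edge list
        y[j] -= f
    return y


def replicated_buffers(x: List[float], edges: List[Edge], nprocs: int) -> List[float]:
    n = len(x)
    chunk = (len(edges) + nprocs - 1) // nprocs
    buffers = []
    for p in range(nprocs):
        # Each simulated processor reduces its chunk of edges into a private full-size buffer.
        buf = [0.0] * n
        for i, j in edges[p * chunk:(p + 1) * chunk]:
            f = force(x[i], x[j])
            buf[i] += f
            buf[j] -= f
        buffers.append(buf)
    # Combine phase: on a real machine this step needs synchronization.
    return [sum(buf[k] for buf in buffers) for k in range(n)]


def owner(k: int, n: int, nprocs: int) -> int:
    # Assumed block distribution: processor p owns a contiguous range of nodes.
    block = (n + nprocs - 1) // nprocs
    return k // block


def local_write(x: List[float], edges: List[Edge], nprocs: int) -> List[float]:
    n = len(x)
    # Give each edge to the owner(s) of its endpoints; cross-partition edges go to both.
    my_edges: List[List[Edge]] = [[] for _ in range(nprocs)]
    for i, j in edges:
        for p in {owner(i, n, nprocs), owner(j, n, nprocs)}:
            my_edges[p].append((i, j))
    y = [0.0] * n
    for p in range(nprocs):
        for i, j in my_edges[p]:
            f = force(x[i], x[j])  # computed once per owner, so cross edges are replicated
            if owner(i, n, nprocs) == p:
                y[i] += f  # write only to locally owned elements, no buffers needed
            if owner(j, n, nprocs) == p:
                y[j] -= f
    return y


if __name__ == "__main__":
    x = [float(k * k) for k in range(8)]
    edges = [(0, 5), (1, 2), (3, 7), (6, 1), (2, 6), (4, 0)]
    print(local_write(x, edges, 3) == sequential(x, edges))
    print(replicated_buffers(x, edges, 3) == sequential(x, edges))
```

In this sketch the replicated-buffer version pays for a full-size buffer per processor plus a combine step, whereas `local_write` needs neither but evaluates `force` twice for every edge whose endpoints have different owners, which mirrors the trade-off described in the abstract.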
Similar papers
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes
Current compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems (CHAOS) or by relying on replicated buffers and the shared-memory interface supported by software DSMs (TreadMarks). We introduce LocalWrite, a new technique for parallelizing irregular reductions based on the owner-computes rule. It eliminates th...
Efficient compiler and run-time support for parallel irregular reductions
Many scientific applications are comprised of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, then combined using synchronization. We develop LOCALWRITE, a new technique which partitions irregular reductions so that each processor computes values only for locally assigned data, eli...
Efficient Compiler and Run-Time Support for Parallel Irregular
Many scientific applications are comprised of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, then combined using synchronization. We develop LocalWrite, a new technique which partitions irregular reductions so that each processor computes values only for locally assigned data, elim...
Software Support For Improving Locality in Advanced Scientific Codes
Programs can achieve good performance only if they possess data locality. This paper describes our proposal to develop and evaluate software support for improving locality for advanced scientific applications for both sequential and parallel machines. The basic premise is that both compile-time analyses and sophisticated run-time systems are necessary. Run-time systems are needed because many p...
Software Support For Improving Locality in Scientific Codes
We propose to develop and evaluate software support for improving locality for advanced scientific applications. We will investigate compiler and run-time techniques needed to achieve high performance on both sequential and parallel machines. We will focus on two areas. First, iterative PDE solvers for 3D partial differential equations have poor locality because accesses to nearby elements in h...
Publication date: 1998